Abstract. We study the dynamics of a backtracking procedure capable of proving the uncolourability of graphs, and calculate its average running time $T$ for sparse random graphs, as a function of the average degree $c$ and the number of vertices $N$. The analysis is carried out by mapping the history of the search process onto an out-of-equilibrium (multi-dimensional) surface growth problem. The growth exponent of the average running time, $\omega(c) = (\ln T)/N$, is quantitatively predicted, in agreement with simulations.



PACS numbers: 05.10., 05.70., 89.20. 
1. Introduction 

The wide variety of practical problems that can be mapped onto NP-complete problems, together with the challenge of answering one of the most important open questions in theoretical computer science, 'Does NP = P?', has led to intensive studies in the past decades. Despite intense efforts, the worst-case running times of all currently known algorithms for these problems grow exponentially with the size of their inputs. However, NP-complete problems are not always hard. They may even be easy to solve on average [?], i.e. when their resolution complexity is measured with respect to some underlying probability distribution of instances. This 'average-case' behaviour depends, of course, on the input distribution.

In the graph colouring problem, one of the best-known combinatorial optimization problems, with applications ranging from timetabling and scheduling [?], through register allocation [?], to frequency assignment [?], the average-case behaviour is often defined on random graphs. The aim is to colour the vertices of the graph such that no two adjacent vertices have the same colour. Whether this can be done with $k$ or fewer colours constitutes the so-called $k$-colouring ($k$-COL) decision problem. 2-COL is easy and can be decided in a time growing polynomially with the size (number of vertices) of the graph, while $k$-COL is NP-complete for any $k \ge 3$ [?]. We shall restrict ourselves to the investigation of 3-COL in the following, and denote by Red (R), Green (G) and Blue (B) the available colours. Graphs will be generated according to the $G(N,p)$ distribution: they are made of $N$ vertices, linked two by two through edges with probability $p$. Studies of the 3-COL problem on this ensemble have indicated that, for sparse random graphs in which $p = c/N$, a phase transition between a colourable and an uncolourable phase occurs in the large-$N$ limit as the connectivity (average vertex degree) $c$ is varied [?]. Below a critical value $c_3$ of the connectivity $c$, almost all instances are colourable, whereas above $c_3$ the probability that an instance is colourable drops to zero. Determining $c_3$ is an open question in random graph theory first posed by Erdős [12]. Nevertheless, years of investigation have yielded lower and upper bounds for $c_3$. Probabilistic counting arguments have led to the best known upper bound, $c_3 < 4.99$ [13]. A recent analysis of a "smoothed" version of the Brélaz heuristic [14] has yielded the highest lower bound, $c_3 > 4.03$ [15].

In a recent work, Mulet et al. used a mapping of the graph colouring problem onto the Potts model, and applied statistical mechanics methods to estimate $c_3$ [16]. The result, $c_3 \simeq 4.69$, is very close to numerical simulations [17]. Below $c_3$, solving 3-COL can be done by exhibiting a proper colouring, a task carried out by search algorithms in an apparently efficient way [18]. Above $c_3$, resolving an instance almost surely means exhibiting a proof of its uncolourability, a very hard task. One of the most popular algorithms capable of exhibiting such proofs is the Davis-Putnam-Logemann-Loveland procedure (DPLL) [19]. Its operation amounts to a clever exhaustive search in the configuration space, based on the trial-and-error principle. Generally, the time needed by DPLL to establish the absence of a colouring grows exponentially with the size of the graph, $T \sim \exp[N\,\omega(c)]$. The purpose of this paper is to calculate $\omega$ as a function of $c$. Such a study was recently undertaken for the satisfiability problem [?] and for vertex covering [?], both hard decision problems. The interest of 3-COL with respect to these cases is its intrinsic symmetry: from any proper colouring, five other colourings can be deduced through colour permutations. It is therefore interesting to understand whether respecting or breaking this symmetry can lead to computational gains, and how this can be implemented in the dynamics of the search algorithm [22].

Hereafter, we focus on colouring heuristics that do not explicitly break the symmetry between colours. The analysis of the biased case is left to a forthcoming companion paper [22]. This article is organized as follows. The colouring algorithm is presented in Section 2. Section 3 is devoted to the analysis of the dynamics and of the resolution time of the algorithm. In the last section, Section 4, we summarize and propose some perspectives.

2. Description of the Colouring Algorithm 

The algorithm which we analyze in this paper is a complete algorithm capable of determining whether a given graph is 3-colourable or not. The algorithm is based on a combination of a colouring heuristic, 3-GREEDY-LIST (3-GL), and of backtracking.

2.1. Operation of the Greedy-List algorithm with backtracking

The action of the colouring procedure is illustrated in Figure 1 and described as follows:

• Necessary information: while running, the algorithm maintains for each uncoloured vertex a list of available colours, which consists of all the colours that can be assigned to this vertex given the colours already assigned to its neighbours.

• Colouring order: the vertices are coloured in such an order that the most constrained vertices, i.e. those with the fewest available colours, are coloured first. At each time step, a vertex is chosen among the most constrained vertices, and its colour is selected from its list of available colours. Both choices are made according to some heuristic rule, which can be unbiased (no preference is made between colours) or biased (following a hierarchy between colours); see the next section.

• List updating: to ensure that no adjacent vertices have the same colour, whenever a vertex is assigned a colour, this colour is removed from the lists (if present) attached to each of its uncoloured neighbours.

• Contradictions and backtracking: a contradiction occurs as soon as one of the lists becomes empty. The algorithm then backtracks to the most recently chosen vertex which had more than one available colour (the closest node in the search tree; see the definition below).

• Termination condition: the algorithm stops when all vertices are coloured, or when all colouring possibilities have been tried.

The action of the algorithm is described by a search tree with the following components:

• Node: a node in the tree represents a vertex chosen by the algorithm which has more than one colour in its list of available colours.

• Edge: an edge coming out of a node corresponds to a possible colour of the chosen vertex.

• Leaf: a branch terminates either with a solution (denoted by S) or with a contradiction (denoted by C), depending on whether the colour choices made along this branch give a proper colouring of the graph or not.
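The procedure of Section 2.1 and the search-tree vocabulary above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency-dict graph format and the name `gl_backtrack` are hypothetical, ties among the most constrained vertices are broken by vertex label (the unbiased heuristic of the text picks at random), and the colour of the very first vertex is fixed using the colour permutation symmetry. The function counts nodes (chosen vertices with more than one colour tried) and leaves (C or S).

```python
from typing import Dict, Set, Tuple

COLOURS = (0, 1, 2)  # stand-ins for R, G, B


def gl_backtrack(adj: Dict[int, Set[int]]) -> Tuple[bool, Dict[str, int]]:
    """Depth-first 3-GL search with backtracking (illustrative sketch).

    adj maps each vertex to the set of its neighbours (undirected graph).
    Returns (colourable, stats), where stats counts search-tree nodes and leaves.
    """
    stats = {"nodes": 0, "leaves": 0}

    def search(avail: Dict[int, Set[int]], n_coloured: int) -> bool:
        # Contradiction: some uncoloured vertex has an empty list (leaf C).
        if any(not colours for colours in avail.values()):
            stats["leaves"] += 1
            return False
        # Every vertex coloured: a proper colouring was found (leaf S).
        if not avail:
            stats["leaves"] += 1
            return True
        # Colouring order: most constrained vertex first, ties broken by label.
        v = min(avail, key=lambda u: (len(avail[u]), u))
        choices = sorted(avail[v])
        if n_coloured == 0:
            # Colour symmetry: the first vertex's colour can be fixed arbitrarily.
            choices = choices[:1]
        if len(choices) > 1:
            stats["nodes"] += 1  # a branching point of the search tree
        for colour in choices:
            # List updating: drop v and remove its colour from uncoloured neighbours.
            child = {u: set(cs) for u, cs in avail.items() if u != v}
            for u in adj[v]:
                if u in child:
                    child[u].discard(colour)
            if search(child, n_coloured + 1):
                return True
        return False  # all options exhausted: backtrack to the previous node

    colourable = search({v: set(COLOURS) for v in adj}, 0)
    return colourable, stats


if __name__ == "__main__":
    k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
    print(gl_backtrack(k4))  # (False, {'nodes': 1, 'leaves': 2})
```

On the complete graph $K_4$ the sketch builds a complete tree with $Q = 1$ node and $B = 2$ contradiction leaves, consistent with the relation $Q + 1 = B$ used in Section 3.1.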

2.2. Colour symmetry: the unbiased 3-GL heuristic 

Let us call the 3-GL heuristic the incomplete version of the above algorithm, obtained when the algorithm stops if a colouring is found (and outputs "Colourable"), or just after the first contradiction instead of backtracking (and outputs "Don't know if colourable or not"). In contrast to the 3-GL algorithm with backtracking, the 3-GL heuristic is not able to prove the absence of a solution, but it is amenable to rigorous analysis [15].

Figure 1. Two examples which demonstrate how the GL algorithm acts on a colourable (left side) and an uncolourable (right side) graph. The figure illustrates how the search tree grows with the operation of the algorithm. Available colours at each step are denoted by the patterns of the filled circles attached to vertices. When a vertex is coloured, it is removed from the graph, together with all its attached edges. In addition, the chosen colour is removed from the neighbours' sets of available colours. On the left side of the figure, a colourable graph is coloured by the algorithm: no contradiction is encountered, and the algorithm finds a solution without backtracking. On the right side, the algorithm tries to colour an uncolourable graph. When it first hits a contradiction (step 2), i.e. when two 1-colour vertices connected by an edge are left with the same available colour, the algorithm backtracks to the last-coloured vertex and tries to colour it with its second available colour. When a contradiction is hit again, the algorithm terminates. Note that, in principle, it could backtrack to the first-coloured node and try other colour options. However, due to colour gauge symmetry, this would not yield a solution.

In the simplest case, vertices and colours are chosen purely at random, without any bias between colours (Colouring order step described above). This 'symmetric' 3-GL heuristic satisfies two key properties on which our analysis relies. The first one is a statistical invariance called the R-property: throughout the execution of the algorithm, the uncoloured part of the graph is distributed as $G((1-t)N, p)$, where $t$ is the number of coloured vertices divided by $N$. The second property is colour symmetry: the search heuristic is symmetric with respect to the different colours, and so are the initial conditions. Denoting by $\{R, G, B\}$ the set of the three available colours, a 2-colour node can have one of three possible lists, $\{R,G\}$, $\{R,B\}$ or $\{G,B\}$, and similarly there are three possible lists for a 1-colour node. Due to colour symmetry, in the limit of large $N$, we expect the groups of 1-colour and 2-colour vertices each to be composed of equal numbers of vertices (up to $o(N)$ fluctuations) with the three kinds of lists. Hence, to leading order, the evolution of the algorithm can be expressed through the evolution of the three numbers $N_j(T)$ of $j$-colour nodes ($j = 1,2,3$). The evolution of these numbers in the course of the colouring was analysed by Achlioptas and Molloy [?]. It is briefly recalled below.

2.3. Analysis of the symmetric 3-GL heuristic 

In the absence of backtracking, 3-GL terminates as soon as a contradiction occurs or a solution is found. Differential equations can be used to track the evolution of node populations as the colouring proceeds [15]. In this section we briefly recall how to obtain these differential equations, and the associated search trajectories of the heuristic in terms of node populations.

According to the R-property, the probability that a $j$-colour node is a neighbour of the currently coloured node equals $c/N$ throughout the running of the heuristic. The probability that the chosen colour appears in its list is $j/3$. Therefore the two average flows of vertices, $w_2(T)$ from $N_3(T)$ to $N_2(T)$ and $w_1(T)$ from $N_2(T)$ to $N_1(T)$, are $cN_3(T)/N$ and $2cN_2(T)/(3N)$ respectively. Hence, the evolution equations for the three populations of vertices read

$$\begin{aligned} N_3(T+1) &= N_3(T) - w_2(T), \\ N_2(T+1) &= N_2(T) + w_2(T) - w_1(T) - \delta_{N_1(T)}, \\ N_1(T+1) &= N_1(T) + w_1(T) - \big(1 - \delta_{N_1(T)}\big), \end{aligned} \tag{1}$$

where $\delta_{N_1(T)} = 1$ if $N_1(T) = 0$ (a 2-colour vertex is coloured) and $\delta_{N_1(T)} = 0$ if $N_1(T) \neq 0$ (a 1-colour vertex is coloured). For $c > 1$, both $N_2(T)$ and $N_3(T)$ are extensive in $N$, and can be written as

$$N_i(T) = n_i(T/N)\,N + o(N). \tag{2}$$

The appearance of the reduced time $t = T/N$ means that the population densities $n_i(T/N)$ change by $O(1)$ over $O(N)$ time intervals. To avoid the appearance of contradictions, the number of 1-colour vertices must remain of $O(1)$ throughout the execution of the algorithm. From queueing theory, this requires $w_1(t) < 1$, that is

$$\frac{2}{3}\,c\,n_2(t) < 1, \tag{3}$$

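which means that 1-colour nodes are created slowly enough to be coloured and do not accumulate. Thus, in the absence of backtracking, the evolution equations for the densities are

$$\frac{dn_3(t)}{dt} = -c\,n_3(t), \qquad \frac{dn_2(t)}{dt} = c\,n_3(t) - 1, \tag{4}$$

where the rate $c\,n_3 - 1$ combines the inflow $w_2 = c\,n_3$ of new 2-colour nodes, the outflow $w_1$ towards 1-colour nodes, and the colouring of a 2-colour node during the remaining fraction $1 - w_1$ of time steps. The solution of these differential equations, with initial conditions $n_3(0) = 1$, $n_2(0) = 0$, is

$$n_3(t) = e^{-ct}, \qquad n_2(t) = 1 - t - e^{-ct}. \tag{5}$$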

Eqs. (4) and (5) were obtained under the assumption that $n_2(t) > 0$, and hold until $t = t_2$, at which the density $n_2$ of 2-colour nodes vanishes. For $t > t_2$, 2-colour vertices no longer accumulate: they are coloured as soon as they are created. 1-colour vertices are almost never created, and the vertices coloured by the algorithm are either 2- or 3-colour vertices. Thus, for $t_2 < t < 1$, $n_2(t) = 0$, and $n_3(t) = 1 - t$ decreases to zero. A proper colouring is found at $t = 1$, i.e. when all nodes have been coloured.
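The population dynamics (1) can be simulated directly, drawing the flows $w_1$ and $w_2$ as binomial random variables (as the R-property suggests) instead of generating an actual graph. The sketch below is an illustration under stated assumptions: it uses NumPy's `Generator.binomial`, the function name `simulate_populations` is hypothetical, contradictions are not tracked (1-colour neighbours that lose their last colour are simply ignored), and a 3-colour vertex is coloured whenever no 1- or 2-colour vertex is left, a case not written out in (1). For $c < c_L$ the densities are expected to follow (5).

```python
import math

import numpy as np


def simulate_populations(N: int, c: float, seed: int = 0) -> np.ndarray:
    """Markov chain (1) for the node populations of the 3-GL heuristic.

    Returns an array whose row T is (N1, N2, N3) after T colourings. Instead of
    building a graph, neighbourhoods are drawn from the R-property: each
    uncoloured vertex neighbours the vertex being coloured with prob. c/N.
    Contradictions are not tracked (simplifying assumption).
    """
    rng = np.random.default_rng(seed)
    p = c / N
    n1, n2, n3 = 0, 0, N
    history = [(n1, n2, n3)]
    for _ in range(N):
        # Colouring order: a 1-colour vertex if any, else 2-colour, else 3-colour.
        if n1 > 0:
            n1 -= 1
        elif n2 > 0:
            n2 -= 1
        else:
            n3 -= 1
        # w2: 3-colour neighbours lose the chosen colour and become 2-colour.
        w2 = int(rng.binomial(n3, p))
        # w1: 2-colour neighbours whose list contains the colour (prob. 2/3).
        w1 = int(rng.binomial(n2, 2 * p / 3))
        n3 -= w2
        n2 += w2 - w1
        n1 += w1
        history.append((n1, n2, n3))
    return np.array(history)


if __name__ == "__main__":
    N, c, t = 20000, 3.0, 0.5
    n1, n2, n3 = simulate_populations(N, c)[int(t * N)] / N
    # Expected to be close to (5): n3 = exp(-c t), n2 = 1 - t - exp(-c t).
    print(n3, math.exp(-c * t))
    print(n2, 1 - t - math.exp(-c * t))
```

Each step removes exactly one vertex from the uncoloured pool, so $N_1 + N_2 + N_3 = N - T$ holds exactly along the simulated trajectory.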

These equations define the trajectory of the algorithm in phase space in the absence of contradictions, i.e. as long as condition (3) is fulfilled. The trajectory corresponding to $c = 3$ is plotted in Figure 2. Since the maximum of $n_2(t)$ in (5), $1 - (1 + \ln c)/c$, is reached at $t = \ln c/c$, condition (3) holds at all times as long as $c - \ln c < 5/2$, i.e. for $c < c_L \simeq 3.847$. In this regime, the probability that the algorithm finds a proper colouring without backtracking is positive. The complexity $\gamma(c)\,N$ of the algorithm is then linear in $N$, and equals the number of nodes in the single branch of the search tree,

$$\gamma(c) = 1 - \frac{2}{3}\,c\int_0^{t^*} dt\; n_2(t), \tag{6}$$

where $t^* > 0$ is the first time (after $t = 0$) at which $n_2(t)$ becomes 0.
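The threshold $c_L$ and the linear-time complexity (6) can be evaluated numerically from (5). The sketch below assumes that $c_L$ is the value at which the maximum of $\frac{2}{3}c\,n_2(t)$ reaches 1, i.e. the root of $c - \ln c = 5/2$, and evaluates (6) as $\gamma(c) = 1 - \frac{2}{3}c\int_0^{t^*} n_2\,dt$ (2-colour vertices being coloured during the fraction $1 - w_1$ of steps). Using $e^{-ct^*} = 1 - t^*$, the integral has the closed form $t^* - t^{*2}/2 - t^*/c$. The function names are illustrative; plain bisection is used for the roots.

```python
import math
from typing import Callable


def bisect_root(f: Callable[[float], float], lo: float, hi: float, iters: int = 200) -> float:
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def c_L() -> float:
    """Largest c with (2c/3) * max_t n2(t) <= 1, i.e. root of c - ln c = 5/2 (c > 1)."""
    return bisect_root(lambda c: c - math.log(c) - 2.5, 1.0, 10.0)


def t_star(c: float) -> float:
    """First positive zero of n2(t) = 1 - t - exp(-c t), for c > 1."""
    return bisect_root(lambda t: -math.expm1(-c * t) - t, 1e-9, 1.0)


def gamma(c: float) -> float:
    """Eq. (6): number of search-tree nodes divided by N, for 1 < c < c_L."""
    ts = t_star(c)
    integral = ts - ts ** 2 / 2 - ts / c  # closed form of int_0^{t*} n2(t) dt
    return 1 - (2 * c / 3) * integral


if __name__ == "__main__":
    print(c_L())       # close to 3.847
    print(gamma(3.0))  # fraction of vertices that are search-tree nodes at c = 3
```

Bisection is sufficient here because $c - \ln c$ is increasing for $c > 1$, and $n_2(t)$ in (5) is concave, so it has a single positive zero on $(0, 1]$.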

For $c > c_L$, condition (3) is violated at a time $t_d(c)$ which depends on $c$, and 1-colour vertices start to accumulate. As a result, the probability of contradictions becomes large, and backtracking comes into play.



3. Study of the 3-Greedy List algorithm with backtracking 

The analytical study of the complexity in the presence of backtracking is inspired by previous analyses of the resolution of random 3-SAT instances with the DPLL algorithm [?].



3.1. Evolution equation for the search process 

Figure 2. Trajectories of dominant search branches generated by 3-GL in the UNCOL phase ($c > c_3 \simeq 4.7$), compared to a search trajectory in the easy COL phase ($c < c_L \simeq 3.85$). The horizontal and vertical axes represent the densities $n_2$ and $n_3$ of 2- and 3-colour nodes respectively. Trajectories are depicted by solid curves, and the arrows indicate the direction of motion (increasing depth of the search tree); they originate from the top left corner, with coordinates $(n_2 = 0, n_3 = 1)$, since all nodes in the initial graph are 3-colour nodes. Dots at the end of the UNCOL trajectories ($c = 7, 10, 20$) mark the halt point at which the condition $n_2 < 3\ln 2/c$ ceases to be fulfilled, and the search tree stops growing [24]. Note that as the initial connectivity increases, the trajectories halt at an earlier stage, implying the early appearance of contradictions as the problem becomes over-constrained (large connectivity values). The COL trajectory (shown here for $c = 3$) represents the under-constrained region of the problem, where the very first search branch finds a proper colouring (bottom left corner, with coordinates $(n_2 = 0, n_3 = 0)$).

In the absence of a solution, the algorithm builds a complete search tree before stopping. In a complete tree (each node having two outgoing branches), $Q + 1 = B$, where $B$ is the number of leaves and $Q$ the number of nodes. This relation implies that the key to obtaining the complexity $Q$ lies in the calculation of $B$. In order to enable a mathematical analysis of $B$, we rely on the fact that the search tree is complete, and therefore the sequential (depth-first) way in which the algorithm builds it is irrelevant to the final structure. In other words, the order in which the available colours of a vertex are tried does not affect the final shape of the tree. An identical tree can be built in a parallel (breadth-first) way, defined as follows and illustrated in Figure 3.

At time $T = 0$, the tree reduces to a root node, to which are attached the graph to colour and an outgoing edge. At time $T$, that is, after having coloured $T$ vertices of the graph attached to each branch, the tree is made of $B(T)$ ($\le 2^T$) branches, each one carrying a partially coloured graph. At the next time step $T \to T+1$, a new layer is added to the tree by colouring, according to the 3-GL heuristic, one more vertex along every branch. As a result, at each instant of the parallel process, branches either die (encounter a contradiction), keep growing (a 1-colour vertex is coloured), or split (a 2-colour vertex is chosen and its two available colours are tried simultaneously) (Figure 3). This parallel growth process is Markovian, and can be encoded in an instance-dependent (and exponentially large in $N$) evolution matrix $H$ [24].

To obtain a tractable expression for $H$, we neglect correlations arising from the choice of the same vertex in two different branches. After assigning $T$ variables, each branch represents a different sequence of $T$ coloured vertices, which determines the values $\{N_1, N_2, N_3\}$ attached to this branch. Denoting by $B(N_1,N_2,N_3;T)$ the number of branches at time $T$ with $N_i$ $i$-colour vertices ($i = 1,2,3$), the growth process of the search tree can be described by the evolution of $B(N_1,N_2,N_3;T)$ with time. This evolution is given by

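$$B(N_1,N_2,N_3;T+1) = \sum_{N_1',N_2',N_3'=0}^{\infty} H(N_1,N_2,N_3,N_1',N_2',N_3';T)\,B(N_1',N_2',N_3';T), \tag{7}$$

where

$$\begin{aligned} H(N_1,N_2,N_3,N_1',N_2',N_3';T) = \sum_{w_2=0}^{N_3'} &\binom{N_3'}{w_2}\Big(\frac{c}{N}\Big)^{w_2}\Big(1-\frac{c}{N}\Big)^{N_3'-w_2}\,\delta_{N_3-N_3'+w_2} \\ \times\Bigg[&(1-\delta_{N_1'})\sum_{w_1=0}^{N_2'}\binom{N_2'}{w_1}\Big(\frac{2c}{3N}\Big)^{w_1}\Big(1-\frac{2c}{3N}\Big)^{N_2'-w_1}\,\delta_{N_2-N_2'-(w_2-w_1)}\,\delta_{N_1-N_1'-w_1+1} \\ +\;&2\,\delta_{N_1'}\sum_{w_1=0}^{N_2'-1}\binom{N_2'-1}{w_1}\Big(\frac{2c}{3N}\Big)^{w_1}\Big(1-\frac{2c}{3N}\Big)^{N_2'-1-w_1}\,\delta_{N_2-N_2'-(w_2-w_1-1)}\,\delta_{N_1-N_1'-w_1}\Bigg] \end{aligned} \tag{8}$$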
is the branching matrix of the 3-GL algorithm, and $\delta_N$ is the Kronecker delta function ($\delta_N = 1$ if $N = 0$, and 0 otherwise). The matrix gives the average number of branches with $\{N_i\}_{i=1}^{3}$ $i$-colour vertices which come out of branches with $\{N_i'\}_{i=1}^{3}$ $i$-colour vertices, as a result of all the colouring options of the vertex coloured at time $T$. The R-property is responsible for the binomial distributions of the flows $w_1$ and $w_2$ in (8). Note that (8) is written under the assumption that no 3-colour nodes are chosen by the algorithm throughout the growth process. This assumption is consistent with the resulting solution, which shows that in the uncolourable (UNCOL) region, $n_2(t)$, namely the number of 2-colour vertices divided by $N$, remains positive for all $t > 0$.
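To make the branching matrix (8) concrete, the sketch below computes one of its columns: the expected number of child branches with each $(N_1, N_2, N_3)$ produced by a single parent branch $(N_1', N_2', N_3')$. It is an illustration only; the name `branch_step` is hypothetical, and, like (8), it does not model contradictions explicitly and assumes that no 3-colour vertex is ever chosen (so $N_2' \ge 1$ whenever $N_1' = 0$). The total weight is 1 when a 1-colour vertex is coloured and 2 when the branch splits.

```python
from collections import defaultdict
from math import comb
from typing import Dict, Tuple

State = Tuple[int, int, int]  # (N1, N2, N3)


def binom_pmf(n: int, k: int, p: float) -> float:
    """Probability of k successes among n independent trials of probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def branch_step(parent: State, N: int, c: float) -> Dict[State, float]:
    """One column of the branching matrix (8): expected children of one branch."""
    n1, n2, n3 = parent
    p3, p2 = c / N, 2 * c / (3 * N)  # neighbour probabilities for w2 and w1
    out: Dict[State, float] = defaultdict(float)
    if n1 > 0:
        # A 1-colour vertex is coloured: one child branch.
        n1_base, n2_pool, n_children, n2_removed = n1 - 1, n2, 1, 0
    else:
        # A 2-colour vertex is coloured: two children, one per available colour.
        n1_base, n2_pool, n_children, n2_removed = 0, n2 - 1, 2, 1
    for w2 in range(n3 + 1):
        pw2 = binom_pmf(n3, w2, p3)
        for w1 in range(n2_pool + 1):
            pw1 = binom_pmf(n2_pool, w1, p2)
            child = (n1_base + w1, n2 - n2_removed + w2 - w1, n3 - w2)
            out[child] += n_children * pw2 * pw1
    return dict(out)


if __name__ == "__main__":
    split = branch_step((0, 5, 10), N=100, c=10.0)
    print(sum(split.values()))  # total weight 2 (up to rounding): the branch splits
```

Iterating this map over a dictionary of branch counts reproduces (7) for small $N$, at a cost that grows quickly with the number of distinct population vectors.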

3.2. Resolution of the evolution equation 

In order to obtain the complexity from the evolution equation (7) of $B(\mathbf N;T)$, we define the generating function $B(\mathbf y;T)$ of $B(\mathbf N;T)$ as

$$B(\mathbf y;T) = \sum_{\mathbf N} e^{\mathbf y\cdot\mathbf N}\,B(\mathbf N;T), \qquad \mathbf y = (y_1,y_2,y_3), \quad \mathbf N = (N_1,N_2,N_3). \tag{9}$$

Figure 3. Imaginary, breadth-first growth process of a search tree associated with an UNCOL graph and used in the theoretical analysis. $T$ denotes the depth in the tree, that is, the number of nodes coloured along each branch. At depth $T$, one node is chosen on each branch among 1-colour vertices if any (grey circles), or among 2- or 3-colour vertices (splitting, black circles). If a contradiction occurs as a result of colouring a 1-colour node, the branch gets marked with C and dies out. The growth of the tree proceeds until all branches carry C leaves. The resulting tree is identical to the one built through the usual, sequential operation of the 3-GL algorithm.

Plugging (8) and (9) into (7) yields the following evolution equation for the generating function,

$$B(\mathbf y;T+1) = e^{-y_1}\,B(\mathbf g(\mathbf y);T) + \big(2e^{-y_2} - e^{-y_1}\big)\,B\big(-\infty, g_2(\mathbf y), g_3(\mathbf y);T\big), \tag{10}$$

where

$$g_1(\mathbf y) = y_1, \qquad g_2(\mathbf y) = y_2 + \frac{2c}{3N}\big(e^{y_1-y_2} - 1\big), \qquad g_3(\mathbf y) = y_3 + \frac{c}{N}\big(e^{y_2-y_3} - 1\big). \tag{11}$$
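As a worked step (a sketch that keeps only leading orders in $1/N$, assuming $N_2', N_3' = O(N)$), the functions $g_2$ and $g_3$ in (11) come from averaging $e^{\mathbf y\cdot\mathbf N}$ over the binomial flows of (8). For a parent branch in which a 1-colour vertex is coloured:

```latex
\begin{aligned}
\Big\langle e^{y_1(N_1'-1+w_1)+y_2(N_2'+w_2-w_1)+y_3(N_3'-w_2)}\Big\rangle
 &= e^{-y_1}\,e^{y_1N_1'+y_2N_2'+y_3N_3'}
    \Big\langle e^{(y_1-y_2)w_1}\Big\rangle\,\Big\langle e^{(y_2-y_3)w_2}\Big\rangle \\
 &= e^{-y_1}\,e^{y_1N_1'+y_2N_2'+y_3N_3'}
    \Big[1+\tfrac{2c}{3N}\big(e^{y_1-y_2}-1\big)\Big]^{N_2'}
    \Big[1+\tfrac{c}{N}\big(e^{y_2-y_3}-1\big)\Big]^{N_3'} \\
 &\simeq e^{-y_1}\,e^{\,g_1(\mathbf y)N_1'+g_2(\mathbf y)N_2'+g_3(\mathbf y)N_3'} .
\end{aligned}
```

When $N_1' = 0$, a 2-colour vertex is coloured instead, which produces the factor $2e^{-y_2}$ (two child branches, one 2-colour vertex removed); setting the first argument to $-\infty$ in (10) is what restricts that term to branches with $N_1' = 0$.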

To solve (10), we make scaling hypotheses for $B(\mathbf N;T)$ and $B(\mathbf y;T)$ in the large-$N$ limit [?]. Let us examine how a step of the algorithm affects the sizes of the three populations $N_1$, $N_2$, $N_3$. Since the average connectivity is $O(1)$, i.e. each vertex is connected on average to only $O(1)$ vertices, when a vertex is coloured the number of vertices whose status (number of available colours) subsequently changes is bounded by the number of neighbours of the coloured vertex. Hence a reasonable assumption is that the densities $n_i = N_i/N$ change by $O(1)$ after $T = t \times N$ vertices are coloured. The corresponding Ansatz for the number of branches is

$$B(\mathbf N;T) = e^{N\,\omega(n_1,n_2,n_3;t) + o(N)}, \tag{12}$$

where the exponent $\omega$ depends on the densities of $i$-colour nodes ($i = 1,2,3$), and subexponential terms in $N$ are gathered in $o(N)$. From (12) and (9) we obtain the following scaling hypothesis for the generating function,

$$B(\mathbf y;T) = e^{N\,\varphi(y_1,y_2,y_3;t) + o(N)}, \tag{13}$$

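where $\varphi(\mathbf y;t)$ is the Legendre transform of $\omega(\mathbf n;t)$, the logarithm of the number of branches $B(\mathbf N;T)$ divided by $N$,

$$\varphi(\mathbf y;t) = \max_{\mathbf n}\big[\omega(\mathbf n;t) + \mathbf y\cdot\mathbf n\big], \qquad \omega(\mathbf n;t) = \min_{\mathbf y}\big[\varphi(\mathbf y;t) - \mathbf y\cdot\mathbf n\big], \tag{14}$$

where $\mathbf n = (n_1,n_2,n_3)$. The variables $\mathbf y$ and $\mathbf n$ are conjugate Legendre variables; in particular, the typical fractions of $i$-colour nodes at depth $t$ are given by the derivatives of $\varphi$ at vanishing argument,

$$n_i(t) = \frac{\partial\varphi}{\partial y_i}(\mathbf 0;t). \tag{15}$$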



At the initial stage of building the tree, there is a single branch coming out of the root node, carrying a fully uncoloured graph. Thus $B(\mathbf N;T=0) = 1$ if $\mathbf N = (0,0,N)$ and 0 otherwise, and $B(\mathbf y;T=0) = e^{N y_3}$. The initial condition for the function $\varphi$ is simply

$$\varphi(\mathbf y;t=0) = y_3. \tag{16}$$

According to (2), both $N_2(T)$ and $N_3(T)$ are extensive in $N$; hence $n_2 > 0$ and $n_3 > 0$. Conversely, as soon as $N_1(T)$ becomes very large, contradictions are very likely to occur, and the growth process stops. Throughout the growth process, $N_1 = O(1)$ almost surely. Thus $n_1 = 0$ with high probability, and, from (13), $\varphi$ does not depend on $y_1$.

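The independence of $\varphi$ from $y_1$ allows us to choose the latter at our convenience, that is, as a function of $y_2$, $y_3$, $t$. Following the so-called kernel method [22], we see that equation (10) simplifies if $y_1 = y_2 - \ln 2$, since the prefactor $2e^{-y_2} - e^{-y_1}$ of its second term then vanishes. Then, from Ansatz (13), we obtain the following partial differential equation (PDE),

$$\frac{\partial\varphi}{\partial t}(y_2,y_3;t) = -y_2 + \ln 2 - \frac{c}{3}\,\frac{\partial\varphi}{\partial y_2}(y_2,y_3;t) + c\,\big(e^{y_2-y_3} - 1\big)\,\frac{\partial\varphi}{\partial y_3}(y_2,y_3;t). \tag{17}$$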
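The step from (10) to (17) can be spelled out as follows (a sketch at leading order in $1/N$). With $y_1 = y_2 - \ln 2$, the second term of (10) drops out and $e^{-y_1} = 2e^{-y_2}$; inserting (13) and identifying $T + 1$ with $t + 1/N$ gives

```latex
\begin{aligned}
N\,\varphi\!\left(\mathbf y;t+\tfrac1N\right)
  &= \ln 2 - y_2 + N\,\varphi\big(\mathbf g(\mathbf y);t\big),\\
g_2(\mathbf y) - y_2 &= \tfrac{2c}{3N}\big(e^{-\ln 2}-1\big) = -\tfrac{c}{3N},\qquad
g_3(\mathbf y) - y_3 = \tfrac{c}{N}\big(e^{y_2-y_3}-1\big),\\
\frac{\partial\varphi}{\partial t}
  &= \ln 2 - y_2 - \frac{c}{3}\,\frac{\partial\varphi}{\partial y_2}
     + c\big(e^{y_2-y_3}-1\big)\frac{\partial\varphi}{\partial y_3},
\end{aligned}
```

where the last line follows by expanding both sides of the first line to first order in $1/N$; no derivative with respect to $y_1$ appears because $g_1(\mathbf y) = y_1$ and $\varphi$ does not depend on $y_1$.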
The solution of PDE (17) with initial condition (16) reads

$$\varphi(y_2,y_3;t) = \frac{c}{6}\,t^2 - \frac{c}{3}\,t + (1-t)(y_2 - \ln 2) + \ln\Big[3 + e^{-2ct/3}\big(2e^{y_3-y_2} - 3\big)\Big]. \tag{18}$$
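A quick numerical consistency check of (18): the sketch below (function names are illustrative) evaluates the residual of PDE (17) with central finite differences at random points, and checks the initial condition (16). The residual is expected to vanish up to discretization and rounding error.

```python
import math
import random


def phi(y2: float, y3: float, t: float, c: float) -> float:
    """Closed-form solution (18) of the growth PDE (17)."""
    a = math.exp(-2 * c * t / 3)
    return (c * t ** 2 / 6 - c * t / 3 + (1 - t) * (y2 - math.log(2))
            + math.log(3 + a * (2 * math.exp(y3 - y2) - 3)))


def pde_residual(y2: float, y3: float, t: float, c: float, h: float = 1e-5) -> float:
    """Left-hand side minus right-hand side of (17), via central differences."""
    dphi_dt = (phi(y2, y3, t + h, c) - phi(y2, y3, t - h, c)) / (2 * h)
    dphi_dy2 = (phi(y2 + h, y3, t, c) - phi(y2 - h, y3, t, c)) / (2 * h)
    dphi_dy3 = (phi(y2, y3 + h, t, c) - phi(y2, y3 - h, t, c)) / (2 * h)
    rhs = (-y2 + math.log(2) - (c / 3) * dphi_dy2
           + c * (math.exp(y2 - y3) - 1) * dphi_dy3)
    return dphi_dt - rhs


def max_residual(n_points: int = 200, seed: int = 0) -> float:
    """Largest |residual| over random (y2, y3, t, c) samples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_points):
        y2, y3 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        t, c = rng.uniform(0.05, 0.5), rng.uniform(5, 20)
        worst = max(worst, abs(pde_residual(y2, y3, t, c)))
    return worst


if __name__ == "__main__":
    print(max_residual())              # tiny: finite-difference error only
    print(phi(0.3, -0.7, 0.0, 10.0))   # initial condition (16): equals y3
```

The same check can be carried out symbolically; finite differences are used here only to keep the sketch dependency-free.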



3.3. Growth process of the search tree

PDE (17) can be interpreted as a description of the growth process of the search tree resulting from the operation of the algorithm. Using the Legendre transform (14), PDE (17) can be written as an evolution equation for the logarithm $\omega(n_2,n_3;t)$ of the average number of branches with densities $n_2$, $n_3$ of 2- and 3-colour nodes as the depth $t = T/N$ increases,

$$\frac{\partial\omega}{\partial t} = \frac{\partial\omega}{\partial n_2} + \ln 2 - \frac{c}{3}\,n_2 + c\,n_3\left[\exp\!\Big(\frac{\partial\omega}{\partial n_3} - \frac{\partial\omega}{\partial n_2}\Big) - 1\right]. \tag{19}$$

The surface $\omega$, growing with "time" $t$ above the $(n_2, n_3)$ plane, describes the whole distribution of branches. Here, this distribution simplifies due to node conservation: the sum $n_2 + n_3$ of the 2- and 3-colour node densities necessarily equals the fraction $1 - t$ of not-yet-coloured nodes. Therefore, $\omega$ is a function of $n_3$ and $t$ only, whose expression is obtained through the inverse Legendre transform of (18),

$$\omega(n_3;t) = -\frac{c}{6}\,t\,(2 - t + 4n_3) - n_3\ln n_3 - (1 - t - n_3)\ln 2 + (1 - n_3)\ln\Big[3\big(1 - e^{-2ct/3}\big)\Big] - (1 - n_3)\ln(1 - n_3). \tag{20}$$



Figure 4 shows $\omega(n_3;t)$ for $c = 10$.

Figure 4. Function $\omega$ (logarithm of the number of branches with densities $n_2 = 1 - t - n_3$ and $n_3$ of 2- and 3-colour nodes at depth $t$ in the search tree) as a function of $n_3$ and $t$ for $c = 10$. The top of the curve at given time $t$, $\omega^*(t)$, is reached at the dominant-branch 3-colour density $n_3^*(t)$. The evolution of $\omega$ is shown until $t = t_h$, at which dominant branches in the search tree stop growing (die from the onset of contradictions). The maximal $\omega$ at $t_h$, $\omega^*(t_h)$, is our theoretical prediction for the complexity.



The average number of branches at depth $t$ in the tree equals

$$B(t) = \int dn_2\,dn_3\; e^{N\omega(n_2,n_3;t)} \sim e^{N\omega^*(t)}, \tag{21}$$

where

$$\omega^*(t) = \frac{c}{6}\,t^2 - \frac{c}{3}\,t - (1-t)\ln 2 + \ln\big[3 - e^{-2ct/3}\big] \tag{22}$$

is the maximum over $n_2$, $n_3$ of $\omega(n_2,n_3;t)$, reached at $n_2^*(t)$, $n_3^*(t)$. In other words, the exponentially dominant contribution to $B(t)$ comes from branches carrying partially coloured graphs with densities

$$n_3^*(t) = \frac{2}{3e^{2ct/3} - 1}, \qquad n_2^*(t) = 1 - t - n_3^*(t). \tag{23}$$
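The relation between (20), (22) and (23) can be checked numerically: maximizing (20) over $n_3 \in (0, 1 - t)$ should reproduce $\omega^*(t)$, attained at $n_3^*(t)$. The sketch below uses a golden-section search (an arbitrary choice; any one-dimensional maximizer would do, since (20) is concave in $n_3$); function names are illustrative.

```python
import math
from typing import Callable


def omega(n3: float, t: float, c: float) -> float:
    """Eq. (20): log-number of branches per N with 3-colour density n3 at depth t."""
    n2 = 1 - t - n3
    return (-(c / 6) * t * (2 - t + 4 * n3) - n3 * math.log(n3)
            - n2 * math.log(2)
            + (1 - n3) * math.log(3 * (1 - math.exp(-2 * c * t / 3)))
            - (1 - n3) * math.log(1 - n3))


def omega_star(t: float, c: float) -> float:
    """Eq. (22): maximum of omega over n3."""
    return (c / 6 * t ** 2 - c / 3 * t - (1 - t) * math.log(2)
            + math.log(3 - math.exp(-2 * c * t / 3)))


def n3_star(t: float, c: float) -> float:
    """Eq. (23): 3-colour density of the dominant branches."""
    return 2 / (3 * math.exp(2 * c * t / 3) - 1)


def argmax_golden(f: Callable[[float], float], lo: float, hi: float, iters: int = 200) -> float:
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if f(x1) < f(x2):
            a = x1  # the maximum lies in [x1, b]
        else:
            b = x2  # the maximum lies in [a, x2]
    return 0.5 * (a + b)


if __name__ == "__main__":
    c = 10.0
    for t in (0.02, 0.05, 0.1):
        n3 = argmax_golden(lambda x: omega(x, t, c), 1e-12, 1 - t - 1e-12)
        print(t, n3, n3_star(t, c), omega(n3, t, c), omega_star(t, c))
```

Both printed pairs should agree to many digits, which confirms that (22) and (23) are the maximum and the maximizer of (20).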
Under the action of the 3-GL algorithm, initially random 3-colouring instances become random mixed 2&3-colouring instances, where nodes can have either 2 or 3 colours at their disposal. This indicates that the action of the 3-GL algorithm on random 3-colouring instances can be seen as an evolution in the $(n_2, n_3)$ phase space (Figure 2). Each point $(n_2, n_3)$ in this space represents a random mixed 2&3-colouring instance, with an average number $(n_2 + n_3)N$ of nodes and a fraction $n_3/(n_2 + n_3)$ of 3-colour nodes. The parametric plot of $n_2^*(t)$, $n_3^*(t)$ as a function of $t$ gives the trajectories of dominant branches in Figure 2.

The search tree keeps growing as long as no contradictions are encountered, i.e. as long as 1-colour vertices do not accumulate. This amounts to saying that dominant branches are not suppressed by contradictions and become more and more numerous through the colouring of 2-colour nodes,

$$\frac{d\omega^*}{dt}(t) > 0, \tag{24}$$

or equivalently, from (19), $n_2^*(t) < 3\ln 2/c$: at the maximum of $\omega$, both partial derivatives $\partial\omega/\partial n_2$ and $\partial\omega/\partial n_3$ vanish, so that (19) reduces to $d\omega^*/dt = \ln 2 - c\,n_2^*/3$. This defines the halt condition for the dominant branch trajectories in the $(n_2, n_3)$ dynamical phase diagram of Figure 2. Call $t_h$ the halt time at which condition (24) gets violated. The logarithm $\omega^*(t_h)$ of the number of dominant branches at $t = t_h$, when divided by $\ln 2$, yields our analytical estimate for the complexity of resolution, $\log_2 Q/N$.‡
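The analytical estimate can be computed with a few lines of code. The sketch below (function names are illustrative) finds the halt time $t_h$ as the first root of $n_2^*(t) = 3\ln 2/c$ and returns $\omega^*(t_h)/\ln 2$ from (22). It relies on $n_2^*(t)$ in (23) being concave, with its maximum where $e^{2ct/3}$ solves $9E^2 - (6 + 4c)E + 1 = 0$, so that bisection on $[0, t_{\rm peak}]$ isolates the first crossing; if $n_2^*$ never reaches $3\ln 2/c$, no halt point exists and the function returns `None`.

```python
import math
from typing import Optional

LN2 = math.log(2)


def n2_star(t: float, c: float) -> float:
    """Eq. (23): 2-colour density of the dominant branches."""
    return 1 - t - 2 / (3 * math.exp(2 * c * t / 3) - 1)


def omega_star(t: float, c: float) -> float:
    """Eq. (22): log-number of dominant branches per N at depth t."""
    return (c / 6 * t ** 2 - c / 3 * t - (1 - t) * LN2
            + math.log(3 - math.exp(-2 * c * t / 3)))


def predicted_complexity(c: float) -> Optional[float]:
    """omega*(t_h)/ln 2, i.e. the predicted log2(Q)/N, or None if no halt point."""
    threshold = 3 * LN2 / c  # halt condition (24): n2* = 3 ln 2 / c
    # n2* is concave; its maximum solves 9E^2 - (6 + 4c)E + 1 = 0, E = exp(2ct/3).
    b = 6 + 4 * c
    e_peak = (b + math.sqrt(b * b - 36)) / 18
    t_peak = 3 * math.log(e_peak) / (2 * c)
    if n2_star(t_peak, c) <= threshold:
        return None
    lo, hi = 0.0, t_peak  # n2*(0) = 0 < threshold < n2*(t_peak)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if n2_star(mid, c) < threshold:
            lo = mid
        else:
            hi = mid
    t_h = 0.5 * (lo + hi)
    return omega_star(t_h, c) / LN2


if __name__ == "__main__":
    for c in (10.0, 20.0, 50.0):
        print(c, predicted_complexity(c))
```

For $c = 10$ and $c = 20$ this sketch reproduces the $\omega_{\rm THE}$ entries of Table 1 to the digits shown. Because $d\omega^*/dt = 0$ at $t_h$, the result is insensitive to small errors in the computed halt time.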

3.4. Comparison with numerical experiments

To check our theory, we have run numerical experiments to estimate $\omega$, the logarithm of the median solving time, as a function of the initial graph degree $c$. Figure 5 describes the output of these simulations. The easy-hard-easy pattern of the graph colouring (GC) problem when passing from the COL ($c < c_3$) to the UNCOL ($c > c_3$) region is clearly visible, with an exponential scaling of hardness around the critical connectivity.

Table 1 presents results for $\omega$ as a function of the connectivity $c$ in the UNCOL phase, as found from numerical experiments and from the above theory. Note the significant decrease in the complexity as the initial connectivity increases. The extrapolation of numerical results to the large-$N$ limit is described in the inset of Figure 5. For $c = 7$, the agreement between numerical and analytical results is not perfect. However, the high computational complexity of the algorithm for small values of $c$ does not allow us to obtain numerical results for large sizes $N$, and affects the quality of the large-$N$ extrapolation of $\omega$.

In the UNCOL region, as $c$ increases, contradictions emerge at an earlier stage of the algorithm, the probability that the same vertex appears in different branches decreases, and the analytical prediction becomes exact. As a consequence of the early appearance of contradictions, the complexity $\omega$ decreases with $c$. At very large $c$, we find

$$\omega(c) \simeq \frac{3\ln 2}{2c^2} \simeq \frac{1.040}{c^2}, \tag{25}$$

and therefore the (logarithm of the) complexity exhibits a power-law decay with exponent 2 as a function of the connectivity $c$.
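The large-$c$ behaviour (25) can be recovered by a short expansion (a sketch keeping leading orders only). At the maximum of $\omega$, (19) gives $d\omega^*/dt = \ln 2 - c\,n_2^*/3$. For large $c$ the halt time is small, $t_h = O(1/c^2)$, so that $2ct/3 \ll 1$ and, from (23), $n_3^*(t) \simeq 1 - ct$, i.e. $n_2^*(t) \simeq ct$ up to corrections of relative order $1/c$:

```latex
\begin{aligned}
n_2^*(t_h) \simeq c\,t_h = \frac{3\ln 2}{c}
  &\;\Longrightarrow\; t_h \simeq \frac{3\ln 2}{c^2},\\
\omega^*(t_h) \simeq \int_0^{t_h}\Big(\ln 2 - \frac{c^2 t}{3}\Big)\,dt
  &= t_h\ln 2 - \frac{c^2 t_h^2}{6}
   = \frac{3(\ln 2)^2}{c^2} - \frac{3(\ln 2)^2}{2c^2}
   = \frac{3(\ln 2)^2}{2c^2},\\
\frac{\omega^*(t_h)}{\ln 2} &\simeq \frac{3\ln 2}{2c^2} \simeq \frac{1.040}{c^2}.
\end{aligned}
```

The neglected terms (the $-t$ in $n_2^* = 1 - t - n_3^*$ and higher orders in $2ct/3$) only shift the prefactor at relative order $1/c$.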



4. Summary and Discussion 

In this study we have presented an analysis of the complexity of the 3-Greedy-List (GL) algorithm acting on uncolourable (UNCOL) random-graph instances. This analysis provides an estimate of the typical performance of the GL algorithm. Above the colourability threshold $c_3$, proving the absence of a colouring takes a time growing exponentially with the size $N$ of the graph. However, well above the threshold, i.e. for graph connectivities $c \gg c_3$, instances are strongly over-constrained, and the absence of a proper colouring is established more and more quickly. Complexities in this region, though exponential in $N$, have a very small growth rate $\omega(c)$, which for large values of $c$ vanishes as a power law, $\omega(c) \simeq 1.040/c^2$.

‡ Let us stress that our calculation is approximate. First, as mentioned above, correlations between different branches have been neglected. Secondly, $\varphi$ is the Legendre transform of $\omega$ over non-negative values of $\mathbf n$ only, a constraint we have not taken into account in the growth PDE (17). We expect our prediction to be accurate when $n_2$ and $n_3$ do not get too small throughout the growth process, i.e. for large enough connectivities $c$.

Figure 5. Simulation results exhibiting the easy-hard-easy pattern which characterizes the complexity of the GL algorithm. The curves describe the median complexity as a function of connectivity, as measured for $N = 100$ (solid line), $N = 125$ (long-dashed line) and $N = 150$ (dotted line), over 1,000 samples. The arrow denotes the location of the critical connectivity $c_3 \simeq 4.7$ separating the COL (left) from the UNCOL (right) phase. Running times $T$ scale exponentially in the UNCOL phase, $T \sim 2^{N\omega}$. The calculation of $\omega$ as a function of the connectivity $c$ in the UNCOL phase is the purpose of the present work. Inset: polynomial fits (solid lines) to simulation results of $\omega_{\rm HIS} - \log_2 N/(2N)$ vs. $1/N$ for three different connectivity values, $c = 7$ (circles), $c = 10$ (squares) and $c = 15$ (triangles). The fits are to $\omega_{\rm HIS} - \log_2 N/(2N)$, to account for the non-polynomial finite-size corrections to our saddle-point calculation. Extrapolations of the fits to the y-axis are our estimates for $\omega$ at $N \to \infty$, and appear in the second column of Table 1. Note that due to high computational times, results for $c = 7$ have been obtained for sizes up to $N = 500$ only, and therefore provide a less accurate estimate of $\omega$.

c     ω_THE            ω_HIS                            ω_NOD
20    2.886 × 10^-3    2.568 × 10^-3 ± 5.85 × 10^-4     3.038 × 10^-3 ± 3.2 × 10^-4
15    5.255 × 10^-3    4. × 10^-3 ± 7.09 × 10^-4        5.776 × 10^-3 ± 4.79 × 10^-4
10    1.311 × 10^-2    1.371 × 10^-2 ± 1.1 × 10^-3      1.492 × 10^-2 ± 9.6 × 10^-4
7     2.135 × 10^-2    2.879 × 10^-2 ± 6.8 × 10^-3      3.091 × 10^-2 ± 3.6 × 10^-3

Table 1. Analytical results and simulation results for the complexity $\omega$ at different connectivities $c$ in the UNCOL phase. The analytical values $\omega_{\rm THE}$ are derived from the theory; $\omega_{\rm HIS}$ is obtained by measuring the maximal number of branches in the histogram of branch lengths [15], and $\omega_{\rm NOD}$ through a direct measurement of the search tree size.

The present study could be pursued in many directions. First, it would be interesting to analyse the performance of the 3-GL algorithm in the colourable (COL) phase, $c < c_3$. Since graphs with low connectivities ($c < c_L$) are almost surely coloured in a time growing linearly with their size [?], the interesting region is the intermediate range of connectivities, $c_L < c < c_3$. There, proper colourings are found at the cost of an exponential computational effort, which could in principle be quantitatively characterized along the lines of Ref. [?]. Secondly, another interesting extension would be to focus on other search heuristics. Attractive candidates are heuristics that favour high-degree vertices. Such a procedure has recently been analysed (in the absence of backtracking) to improve rigorous lower bounds on the COL-UNCOL threshold $c_3$ [15]. Last, the study of more realistic, e.g. finite-dimensional, graph distributions could aid the understanding of the influence of instance structure on resolution complexity.

As stated in the introduction, the main interest of 3-COL with respect to other NP-complete problems, e.g. SAT, lies in its global gauge symmetry. The 3-GL heuristic we concentrated on here does not break this symmetry, in that it treats all 2-colour nodes on an equal footing when a split has to be made, irrespective of their attached list of available colours, e.g. $\{R,G\}$, $\{R,B\}$ or $\{G,B\}$. It is easy to design heuristics that explicitly violate this symmetry and preferentially colour nodes with R if possible. The analysis of the computational performance of backtracking algorithms based on such an asymmetric heuristic is technically quite difficult, and will be presented in a forthcoming work [22].

A promising outcome of the present work is the relative technical simplicity of our 3-GL analysis with respect to the corresponding studies of DPLL on random SAT instances. The growth partial differential equation monitoring the evolution of the search tree could be solved exactly, in contrast with previous studies of the SAT problem. This essentially comes from a simple conservation law, the sum of the numbers of coloured and uncoloured nodes remaining of course constant throughout the search, and makes 3-GL with backtracking a good candidate for future rigorous studies.